Achievement Trade-Offs and 
No Child Left Behind 



Dale Ballou 

Peabody College of Vanderbilt University 

dale.ballou@vanderbilt.edu 

Matthew G. Springer 
Peabody College of Vanderbilt University 
matthew.g.springer@vanderbilt.edu 



Working Paper 

* Please do not quote or cite without permission 



Paper presented at the NCLB: Emerging Findings Research Conference at the Urban Institute, Washington, D.C. 
on August 12, 2009. The conference was supported, in part, by the National Center for the Analysis of 
Longitudinal Data in Education Research (CALDER), funded by Grant R305A060018 to the Urban Institute 
from the Institute of Education Sciences, U.S. Department of Education and the National Center for 
Performance Incentives (NCPI) at Vanderbilt University. The authors thank the Northwest Evaluation 
Association for providing data for this study as well as an anonymous foundation and the federally-funded 
National Center on School Choice at Vanderbilt University for research support. They also thank Adam 
Gamoran, Steve Rivkin, and Kim Rueben for their helpful comments and insights in developing this work as 
well as seminar participants at the American Education Finance Association, American Educational Research 
Association, Association for Public Policy Analysis and Management, and Amherst College. Special thanks are due 
to Art (Xiao) Peng for his research assistance on this project. The views expressed in the paper are solely those 
of the authors and may not reflect those of the funders or supporting organizations. Any errors are attributable 
to the authors. 





Abstract 



Under the No Child Left Behind Act, states have been required to set minimum 
proficiency standards that virtually all students must meet by 2014. Sanctions of 
increasing severity are to be applied to schools that fail to meet interim targets, known as 
Adequate Yearly Progress (AYP). The authors examine the effect of this legislation using 
longitudinal, student-level test score data from seven states (N > 2,000,000) between 
2002-03 and 2005-06 school years. This paper addresses the following research 
questions: (1) Has NCLB increased achievement among lower-performing students? (2) 
Have these gains come at the expense of students who are already proficient or who are 
far below the proficiency target? Identification is achieved by exploiting the fact that in 
the early years of NCLB, not all grades counted for purposes of determining AYP. The 
estimate of the NCLB effect is therefore based on a comparison of outcomes in high-stakes 
vs. low-stakes years. The authors find consistent evidence of an achievement 
trade-off in the hypothesized direction, though the effects on any given student are not 
large. Unlike some other researchers, they find mixed evidence at best that students far 
below the proficient level have been harmed by NCLB; indeed, at higher grade levels 
they appear to have benefitted. Effects of NCLB on efficiency, while positive, appear to 
be modest. 




1. Introduction 



The No Child Left Behind Act of 2001 (NCLB) is the reauthorization of the nation’s omnibus 
Elementary and Secondary Education Act of 1965 (ESEA). NCLB represents a major effort by the 
federal government to improve academic performance among groups of students who have 
traditionally lagged behind. States have been required to set minimum proficiency standards in 
reading and mathematics. Sanctions of increasing severity are to be applied to schools that fail to 
demonstrate Adequate Yearly Progress (AYP), determined by the percentage of students achieving 
the state-defined performance standard. Over time the percentage of students required to meet this 
standard is ratcheted upwards, until virtually all students must score proficient or better in 2014. 

NCLB targets apply to all of a school’s students as a group, as well as to subgroups within 
the school as long as subgroups meet minimum count requirements. A school fails to make AYP if 
any of the recognized subgroups within that school fails. The main subgroups are defined on the 
basis of race/ethnicity, income (eligibility for the free- and reduced-price lunch program), disability 
(special education students), and English proficiency (English language learners). 
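
The subgroup rule described above can be illustrated with a short sketch. This is a simplified illustration only, not a reproduction of any state's actual AYP calculation: the names `Group` and `makes_ayp`, the target share of 0.60, the minimum count of 30, and the enrollment figures are all hypothetical, and real state rules contain further provisions that are ignored here.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str          # "all" for the whole school, otherwise a subgroup label
    n_tested: int      # number of students tested in the group
    n_proficient: int  # number scoring proficient or better


def makes_ayp(groups, target_share, min_count):
    """Return (passed, failing_group_names) under a simplified AYP rule.

    The all-students group always counts. A subgroup counts only if it has
    at least `min_count` tested students (hypothetical parameter). The school
    fails if any counted group falls below `target_share`.
    """
    failing = []
    for g in groups:
        if g.name != "all" and g.n_tested < min_count:
            continue  # subgroup too small to count toward AYP
        if g.n_tested == 0:
            continue  # nothing to evaluate
        if g.n_proficient / g.n_tested < target_share:
            failing.append(g.name)
    return (len(failing) == 0, failing)


# Hypothetical school: the school as a whole meets the target, one counted
# subgroup does not, and one subgroup is too small to count.
example = [
    Group("all", 400, 260),
    Group("low_income", 120, 60),
    Group("english_learners", 15, 3),
]
result = makes_ayp(example, target_share=0.60, min_count=30)
print(result)
```

In this hypothetical case the school fails solely because of one counted subgroup, even though the school-wide proficiency rate exceeds the target.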

NCLB has been criticized for failing to enhance capacity at low-performing schools and for 
focusing narrowly on a single performance threshold rather than on gains across the spectrum of 
achievement. In order to bring performance of all students up to the prescribed minimum, it is feared 
that schools will divert a disproportionate amount of resources to those students who are particularly 
important to a school’s accountability rating. In the short term this group would consist primarily of 
students near the proficiency threshold but not assured of passing it.1 In the long term it 
will include an ever-larger share of those students below the standard. In schools’ effort to raise 
achievement in this group, traditionally high-performing students may be neglected. In the short run, 
students who are far below the performance threshold may also be neglected. 

1 “Near” the proficiency threshold is a relative term, depending on the distribution of ability within the school. 
We make this concept precise in our definition (below) of a school’s marginal student. 







It has been argued that achievement trade-offs are an inevitable consequence of the design of 
NCLB, suggesting that empirical confirmation is not even required. However, the inevitability of 
trade-offs follows only if schools are operating efficiently, on the production frontier. This should 
not be taken for granted. In the absence of clear accountability, public schools, like other 
organizations, are apt to perform below their operational capacity. A long-standing debate over 
“whether money matters” in public education suggests that at a minimum, public schools frequently 
fail to make efficient use of resources. There may be sufficient slack in the present educational 
system that raising the achievement of marginally-performing students will not require trade-offs in 
the form of lower achievement for others, at least in the near term. We ask, therefore, two questions: 

• Has NCLB increased achievement among lower-performing students? 

• Have achievement gains come at the expense of students who are already proficient or 
who are far below the proficiency target? 

2. Identification Strategies 

We are not the first researchers to study the distributional effects of NCLB in public schools, 
or to ask similar questions about accountability systems more generally. However, by their nature 
accountability systems are typically implemented wholesale, applying to virtually all public schools 
across the board. Apart from a handful of alternative schools for exceptional needs students, or 
schools with very few students, there are no schools outside the accountability system. As a result, 
there is no natural comparison group for estimating the impact of an accountability system on 
educational outcomes. Researchers have resorted to a variety of identification strategies to make 
good this deficit. 

A. Pre- and post-accountability system comparisons 

One strategy relies on pre- and post-accountability comparisons. Neal and Schanzenbach 
(forthcoming) compared mathematics and reading test scores of Chicago Public School students 
before and after the implementation of a high-stakes accountability system.2 They found significant 
increases in mathematics and reading test scores among those around the accountability system's 
proficiency threshold, while traditionally low-performing students did not demonstrate increased 
performance. Effects on the achievement of traditionally high-performing students were mixed. 

Pre- and post-accountability comparisons suffer from drawbacks common to interrupted time 
series designs. Effects of an accountability system can be confounded with other changes occurring 
at the time the system is implemented. In addition, in many states the data needed to evaluate the 
accountability system do not pre-date it, because statewide testing with public disclosure of 
results is frequently introduced as part of the accountability program. As a result, 
either there are no pre-NCLB test data, or the effects of NCLB must be distinguished from those of a 
state accountability system launched at the same time as the testing regime. 

The effect of accountability on student achievement also may be lagged several years, as it 
takes time for teachers and schools to ascertain how the system affects them. Time is needed for 
schools and school systems to develop instructional policies to respond to the system, and even more 
time before their responses have an appreciable impact (if any) on student achievement. Still more 
time is needed for data to become available to researchers for evaluation purposes. A lagged 
response is more likely if an accountability system is phased in or if targets are ratcheted up over 
time, as with NCLB. Thus, early findings that an accountability system does not seem to be working 
must be taken with a grain of salt — it may be too soon to tell. 

B. Exploiting variation in the strength of incentives 

NCLB creates incentives that are stronger for some types of schools than others and that 
affect some students differently than others. In most states, failure to make AYP triggers sanctions 
only for schools receiving Title I funds. These incentives are weakened to the extent that a failing 
school itself does not bear the full costs of these sanctions. However, one would expect pressure to 
be exerted on the schools that are responsible, even when the costs fall on the district. In addition, 
one would expect that as sanctions become more severe, schools will make greater effort to raise 
student achievement. 

2 A similar approach was implemented in Krieg’s (2008) study on the distributional effect of NCLB in 
Washington. 

Variation in the level of sanctions is endogenous if there is any serial correlation in the 
unobserved determinants of achievement, because a school's accountability rating depends on the 
performance of students in that school or the quality of their teachers. Additional identification 
strategies are therefore required. Attempts to remove serial correlation, say by the inclusion of 
school fixed effects in the model, tend to exacerbate measurement error. With fixed effects in the 
model, the impact of NCLB is identified from variation in achievement (relative to the school’s 
mean) that is correlated with variation in sanctions (also relative to the school’s mean). A school that 
faces sanctions as a consequence of an off year is apt to recover the next year without doing anything 
differently. This recovery leads to an upward bias on the estimated treatment effect and may be 
mistakenly interpreted as a positive response to the accountability system. 
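
The mechanism in the paragraph above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis: school quality and yearly noise are drawn from standard normal distributions, a school is "sanctioned" in any year following a score below a hypothetical cutoff, and sanctions have no true effect on scores. All names and parameter values (`simulate`, `cutoff`, the number of schools and years) are hypothetical.

```python
import random


def simulate(n_schools=2000, n_years=4, cutoff=-0.5, seed=0):
    """Generate (school, sanctioned, score) rows with a true sanction effect of zero."""
    rng = random.Random(seed)
    rows = []
    for school in range(n_schools):
        quality = rng.gauss(0.0, 1.0)          # persistent school quality
        prev = quality + rng.gauss(0.0, 1.0)   # year 0 score, used only to set the first sanction
        for _ in range(n_years):
            sanctioned = 1.0 if prev < cutoff else 0.0
            score = quality + rng.gauss(0.0, 1.0)  # sanctions do not enter the score
            rows.append((school, sanctioned, score))
            prev = score
    return rows


def slope(pairs):
    """OLS slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


def demean_within_school(rows):
    """Subtract each school's own means, which is equivalent to school fixed effects."""
    by_school = {}
    for school, d, y in rows:
        by_school.setdefault(school, []).append((d, y))
    demeaned = []
    for obs in by_school.values():
        mean_d = sum(d for d, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        demeaned.extend((d - mean_d, y - mean_y) for d, y in obs)
    return demeaned


rows = simulate()
pooled = slope([(d, y) for _, d, y in rows])   # no fixed effects
within = slope(demean_within_school(rows))     # school fixed effects
print(f"pooled estimate: {pooled:.3f}")
print(f"fixed-effects estimate: {within:.3f}")
```

Under these assumptions the pooled estimate is negative, because weaker schools are sanctioned more often, while the fixed-effects estimate is positive even though sanctions do nothing: a sanctioned year follows an unusually bad year, so scores rebound relative to the school's own mean.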

Several researchers have relied on regression discontinuity techniques to get around the 
endogeneity problem. Rosaen, Schwartz, and Forbes (2007) detected no impact of NCLB on 
mathematics achievement in California and only a slight, positive effect in reading. Chakrabarti 
(2007) and Rouse et al. (2007) reported that public schools graded "F" under Florida's A+ 
accountability system responded to voucher threats differently from those schools graded "D." "F" 
schools significantly increased student achievement. The improvement did not come at the expense 
of high-performing peers. 

As a strategy for studying NCLB, regression discontinuity has one notable drawback. Under 
NCLB, schools that barely made AYP know they will be tested again in the future and judged against 
a standard that is rising. For such schools to behave significantly differently from schools that barely 







